3 research outputs found

    All-rounder: A flexible DNN accelerator with diverse data format support

    Full text link
    Recognizing the explosive increase in the use of DNN-based applications, several industrial companies developed a custom ASIC (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and constructed a hyperscale cloud infrastructure with it. The ASIC performs operations of the inference or training process of DNN models which are requested by users. Since the DNN models have different data formats and types of operations, the ASIC needs to support diverse data formats and generality for the operations. However, the conventional ASICs do not fulfill these requirements. To overcome the limitations of it, we propose a flexible DNN accelerator called All-rounder. The accelerator is designed with an area-efficient multiplier supporting multiple precisions of integer and floating point datatypes. In addition, it constitutes a flexibly fusible and fissionable MAC array to support various types of DNN operations efficiently. We implemented the register transfer level (RTL) design using Verilog and synthesized it in 28nm CMOS technology. To examine practical effectiveness of our proposed designs, we designed two multiply units and three state-of-the-art DNN accelerators. We compare our multiplier with the multiply units and perform architectural evaluation on performance and energy efficiency with eight real-world DNN models. Furthermore, we compare benefits of the All-rounder accelerator to a high-end GPU card, i.e., NVIDIA GeForce RTX30390. The proposed All-rounder accelerator universally has speedup and high energy efficiency in various DNN benchmarks than the baselines

    On Hardware Improvements for Sparse Matrix Multiplication Accelerators

    No full text
    SpGEMM, Distribution Network, Reduction Network, Data tiling
    Deep learning is used and researched across industries such as image processing, natural language processing, and recommendation services. Model sizes are also growing in tandem with deep learning techniques to increase accuracy, and sparse matrix multiplication accounts for the majority of deep learning model operations. As a result, there is a growing need for accelerator research on sparse matrix multiplication. One accelerator that supports the sparse general matrix-matrix multiplication (SpGEMM) operation is SIGMA (A Sparse and Irregular GEMM Accelerator). However, SIGMA's operation networks and index-matching process are inefficient. We propose improvements in three aspects to solve these problems. First, the distribution network's redundant hardware modules are eliminated: when multiple Flex-DPEs are controlled by a NoC (network-on-chip), area and power can be reduced by using a network with the unnecessary parts removed. Second, we propose a new architecture that uses only the output flip-flops to store and accumulate the partial sums of the reduction network. Finally, for fast processing, we propose using the sparsity of each matrix, the number of operation elements, and the matrix size as indicators for choosing an efficient partitioning approach from a precomputed look-up table (LUT). The proposed distribution and reduction network improvements reduce total hardware area by roughly 21.8% and power by 37.5%.
    Using the LUT with a tiling factor of 2, the clock cycles can be reduced by around 80% when the stationary matrix's sparsity is 80% and the streaming matrix's sparsity is 99%.

    Universal primers for Rift Valley fever virus whole-genome sequencing

    No full text
    Rift Valley fever (RVF) is a mosquito-borne zoonotic disease causing acute hemorrhagic fever. Accurate identification of mutations and phylogenetic characterization of RVF virus (RVFV) require whole-genome analysis. Universal primers to amplify the entire RVFV genome from clinical samples with low copy numbers are currently unavailable. Thus, we aimed to develop universal primers applicable to all known RVFV strains. Based on the genome sequences available from public databases, we designed eight pairs of universal PCR primers covering the entire RVFV genome. To evaluate primer universality, four RVFV strains (ZH548, Kenya 56 (IB8), BIME-01, and Lunyo), encompassing viral phylogenetic diversity, were chosen. The nucleic acids of the test strains were chemically synthesized or extracted via cell culture. These RNAs were evaluated using the PCR primers, resulting in successful amplification with the expected sizes (0.8-1.7 kb). Sequencing confirmed that the products covered the entire genome of the RVFV strains tested. Primer specificity was confirmed via in silico comparison against all non-redundant nucleotide sequences using the BLASTn alignment tool in the NCBI database. To assess the clinical applicability of the primers, mock clinical specimens containing human and RVFV RNAs were prepared. The entire RVFV genome was successfully amplified and sequenced at a viral concentration of 10^8 copies/mL. Given the universality, specificity, and clinical applicability of the primers, we anticipate that the RVFV universal primer pairs and the developed method will aid in RVFV phylogenomics and mutation detection. © 2023, The Author(s).